Abstract

The total measurable level of a pathogen is due to many sources, which 
produce a variety of pulses, overlapping in time, that rise suddenly and then 
decay. 

What is measured is the level of the total contribution of the sources at a
given time. But since we are only capable of measuring the total level above
some threshold $x_0$, we would like to predict the distribution below this level.

Our principal model assumption is that of the asymptotic exponential decay
of all pulses. We show that this implies a power law distribution for the
frequencies of low amplitude observations. As a consequence, there is a simple
extrapolation procedure for carrying the data to the region below $x_0$.

Keywords: exponential decay; power-law distribution; completion of data 

1 Introduction 

Acquiring sufficient data of sufficient accuracy is the standard problem in the use of
applicable mathematics. Reliance upon null measurements (i.e., an answer of yes or
no) is often an intelligent way of attending to the latter desideratum, as in the familiar
limiting dilution assays [Lefkowitz and Waldman, 1979]. But the former is frequently
controlled by experimental inability, or perhaps excessive expense, in dealing with
some region of data. If enough is surmised about the structure of the data, such
regions can be reduced by suitable extrapolation, but the implicit assumption of some
sort of analytic structure [for an elegant presentation, see Berman, 2006] runs the
risk of being too much of a mathematical band-aid unless it is justified by a versatile
underlying model.

In this note, we address a situation of some generality. It is that in which an
organism, biological or mechanical, is continually subjected to transient defects, e.g.
pathogenic molecular species, internally or externally incited but soon eliminated.
These inhibit its ability to deal effectively with its environment. We imagine that the
net pathogen level $A$ is measurable at occasional time intervals, but only if it exceeds
some threshold $x_0$ (i.e. $A > x_0$). A null measurement sequence would then give the
relative frequency $G(x_0)$ of measurements falling below the threshold $x_0$. We would
want, e.g., to obtain from this the density function $p(A)$ of amplitudes of the aggregate
pathogen level $A$, with particular attention to the unavailable low amplitudes. The
total pathogen load $A$ at a given measurement would be expected to be the resultant
of the current amplitudes of each of the sources; these sources may be imagined as
time-displaced versions of a discrete set of types, and this is the model that we will
study in detail. The model was originally used in a somewhat different context, that
of the significance of "blips" in HIV viral level in patients undergoing multi-drug
therapy [see Percus et al., 2003].

What we can adjust in this scenario is the threshold level above $x_0$, and then
observe the null frequency $G(x)$ for $x > x_0$. The relationship between the intrinsic
$p(A)$ and $G(x)$ is obvious:

$$G(x) = \int_0^x p(A)\, dA, \tag{1}$$

just the cumulative distribution of $A$. Our task is now to obtain the form of $p(A)$
from the model assumptions and use this, e.g., to extrapolate the available $G(x)$ for
$x > x_0$ to values $0 < x < x_0$.
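
To make the null frequency concrete, here is a minimal Python sketch that estimates
G(x) as the fraction of recorded loads falling below each virtual threshold. The
function name `null_frequency` and the load values are illustrative assumptions, not
data from the study; in an actual null-measurement protocol only the yes/no outcome
at each threshold would be recorded.

```python
from bisect import bisect_left

def null_frequency(samples, thresholds):
    """Empirical G(x): fraction of loads falling strictly below each threshold x."""
    ordered = sorted(samples)
    n = len(ordered)
    # bisect_left returns the number of samples strictly smaller than x
    return [bisect_left(ordered, x) / n for x in thresholds]

# Hypothetical load values (arbitrary units) and virtual thresholds
loads = [0.5, 1.0, 2.0, 4.0]
print(null_frequency(loads, [1.0, 3.0, 5.0]))
```

Sorting once and bisecting keeps the cost low when many thresholds are scanned over
the same set of measurements.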

2 The Underlying Model 




Figure 1: Parameters of Typical Pulse Shape 



We imagine that the arriving pulses are all translations in time of a basic set of
shapes indexed by $\lambda$:

$$F_\lambda(t), \qquad a < t < b. \tag{2}$$

These shapes are non-negative functions such that

$$\int_a^b F_\lambda(t)\, dt \ \text{ is finite.}$$

Now place each of these functions independently on the interval $(-T, T)$ (with
$-T < a < b < T$) $\nu_\lambda$ times. To do this, let $\tau$ be a random variable uniformly
distributed on the interval $(-T, T)$ and $\nu_\lambda$ a Poisson random variable with mean
$2Tq_\lambda$, i.e.

$$h_\lambda(v_\lambda) = P\{\nu_\lambda = v_\lambda\} = \frac{(2Tq_\lambda)^{v_\lambda}}{v_\lambda!}\, e^{-2Tq_\lambda}. \tag{3}$$

The location in time of $F_\lambda$ is determined, for example, so that its maximum is at the
origin (see Fig. 1). The time coordinate of the maximum point of the $i$th occurrence
of $F_\lambda$ is then denoted by $\tau_{\lambda i}$, $i = 1, \ldots, \nu_\lambda$.

The equation of the $i$th occurrence of the curve $F_\lambda$ is then

$$A_{\lambda i} = F_\lambda(t - \tau_{\lambda i}).$$

The total amplitude at any specified time, say $t = 0$, is

$$\hat{A} = \sum_\lambda \sum_{j=1}^{\nu_\lambda} F_\lambda(-\tau_{\lambda j}). \tag{4}$$

We would like to find the probability density of the random variable $\hat{A}$.

Let $p(A)$ be the probability density function of the random variable $\hat{A}$, i.e.

$$p(A) = (\partial/\partial A) \Pr(\hat{A} < A). \tag{5}$$
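
The construction above can be sketched numerically. The Python fragment below draws
the total amplitude at t = 0 for a single pulse type, generating pulse origins as a
Poisson process of rate q on (-T, T) through exponential gaps, which is equivalent to
a Poisson count with uniformly placed origins. The pulse shape `exp_pulse` (a jump
followed by exponential decay), the parameter values, and the function names are
assumptions made for illustration only.

```python
import math
import random

def exp_pulse(t, C=1.0, a=1.0):
    """Assumed pulse shape: jumps to C at t = 0, then decays as C*exp(-a*t)."""
    return C * math.exp(-a * t) if t >= 0.0 else 0.0

def sample_total_amplitude(rng, q, T, pulse):
    """One draw of the total amplitude at t = 0 for a single pulse type.

    Pulse origins form a Poisson process of rate q on (-T, T), so their number
    is Poisson with mean 2*T*q and each origin is uniform on (-T, T).
    """
    total = 0.0
    tau = -T + rng.expovariate(q)      # first origin after -T
    while tau < T:
        total += pulse(-tau)           # contribution F(t - tau) at t = 0
        tau += rng.expovariate(q)      # exponential gap to the next origin
    return total

rng = random.Random(0)
q, T = 2.0, 20.0
draws = [sample_total_amplitude(rng, q, T, exp_pulse) for _ in range(4000)]
mean = sum(draws) / len(draws)
# Campbell's theorem: E[total] = q * (integral of F) = q*C/a = 2 for these values
print(round(mean, 2))
```

Comparing the sample mean with q times the pulse area is a quick sanity check that
the origins are being placed with the intended rate.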

We will assume a steady state distribution of "pathogens" in the course of measurements.
This is a limitation of our approach: often the life-time of the organism may
be comparable to the "decay" time of the pathogen. Under this assumption the system is
translation-invariant in time, which is why we can choose, without loss of generality,
the observation time $t = 0$, as in (4).

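Let us construct the generating function for $p(A)$,

$$w(\alpha) = E\left(e^{-\alpha \hat{A}}\right). \tag{6}$$

Then

$$w(\alpha) = \int_0^\infty e^{-\alpha A}\, p(A)\, dA. \tag{7}$$

We need

$$E\left(e^{-\alpha F_\lambda(-\tau_{\lambda j})}\right) = \frac{1}{2T} \int_{-T}^{T} e^{-\alpha F_\lambda(\tau)}\, d\tau, \tag{8}$$

so that

$$w(\alpha) = \prod_\lambda \sum_{v_\lambda = 0}^{\infty} h_\lambda(v_\lambda) \left[\frac{1}{2T} \int_{-T}^{T} e^{-\alpha F_\lambda(\tau)}\, d\tau\right]^{v_\lambda}. \tag{9}$$

But from (3), $E\left(y^{\nu_\lambda}\right) = e^{2Tq_\lambda(y-1)}$, and we see at once that

$$w(\alpha) = \prod_\lambda \exp\left[q_\lambda \int_{-T}^{T} \left(e^{-\alpha F_\lambda(\tau)} - 1\right) d\tau\right], \tag{10}$$

or letting $T \to \infty$,

$$\int_0^\infty e^{-\alpha A}\, p(A)\, dA = \exp \sum_\lambda q_\lambda \int_{-\infty}^{\infty} \left(e^{-\alpha F_\lambda(\tau)} - 1\right) d\tau, \tag{11}$$

which is our basic expression.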
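
As a hedged numerical illustration of the basic expression, the Python sketch below
evaluates its exponent for one pulse type by a midpoint rule and compares it with a
case that can be done by hand. The rectangular pulse `box_pulse` and all parameter
values are assumptions chosen because the answer is then known: the total amplitude is
the pulse height times a Poisson count of currently active pulses.

```python
import math

def log_transform(alpha, q, pulse, t_lo, t_hi, n=4000):
    """Midpoint-rule estimate of q * integral of (exp(-alpha*F(tau)) - 1) d tau,
    i.e. the exponent of the transform of p(A) for a single pulse type."""
    h = (t_hi - t_lo) / n
    total = 0.0
    for k in range(n):
        tau = t_lo + (k + 0.5) * h
        total += math.exp(-alpha * pulse(tau)) - 1.0
    return q * h * total

def box_pulse(t, height=1.5, width=2.0):
    """Assumed rectangular pulse: constant height on 0 <= t < width."""
    return height if 0.0 <= t < width else 0.0

alpha, q = 0.7, 3.0
numeric = log_transform(alpha, q, box_pulse, -1.0, 3.0)
# For a box pulse the total is height * N with N ~ Poisson(q * width), whose
# transform has exponent q * width * (exp(-alpha * height) - 1).
exact = q * 2.0 * (math.exp(-alpha * 1.5) - 1.0)
print(numeric, exact)
```

The same `log_transform` can be pointed at any other non-negative pulse shape with
finite area, provided the integration window covers its support.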

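Eq. (11) can be expressed more concisely. Define $\Delta\tau_\lambda(F)$ (see Fig. 1) as the
total time that the ordinate $F_\lambda(\tau) > F$, i.e. $\Delta\tau_\lambda(F) = \int \theta\left(F_\lambda(\tau) - F\right) d\tau$, where

$$\theta(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x > 0. \end{cases}$$

Also note that $\Delta\tau'_\lambda(F) = \frac{d}{dF} \Delta\tau_\lambda(F) = -\int \delta\left(F_\lambda(\tau) - F\right) d\tau$, where $\delta(x)$ is the
Dirac $\delta$ function. Then for any function $f$ we have

$$\int f(F)\, \Delta\tau'_\lambda(F)\, dF = -\int\!\!\int f(F)\, \delta\left(F_\lambda(\tau) - F\right) d\tau\, dF = -\int f\left(F_\lambda(\tau)\right) d\tau.$$

It follows that

$$\int_{-\infty}^{\infty} \left(e^{-\alpha F_\lambda(\tau)} - 1\right) d\tau = \int_0^\infty \left(1 - e^{-\alpha F}\right) \Delta\tau'_\lambda(F)\, dF, \tag{12}$$

so that if

$$\tau(F) = \sum_\lambda q_\lambda\, \Delta\tau_\lambda(F), \tag{13}$$

we have the simple equality

$$\int_0^\infty e^{-\alpha A}\, p(A)\, dA = \exp \int_0^\infty \left(1 - e^{-\alpha F}\right) \tau'(F)\, dF. \tag{14}$$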

3 Rationale for Extrapolation 

Eq. (14) can of course be solved for $p(A)$ in nominal closed form by applying the
inverse Laplace transform. But a less formal path is to use (14) to set up an equation
that $p(A)$ satisfies. For this purpose, take the logarithm of the equality (14),
apply the operation $-d/d\alpha$ to both sides, and multiply through by
$\int_0^\infty e^{-\alpha A} p(A)\, dA$, yielding

$$\begin{aligned}
\int_0^\infty e^{-\alpha A} A\, p(A)\, dA &= \int_0^\infty e^{-\alpha F} \left(-F\tau'(F)\right) dF \int_0^\infty e^{-\alpha A}\, p(A)\, dA \\
&= \int_0^\infty\!\!\int_0^\infty Q(F)\, e^{-\alpha(F+A)}\, p(A)\, dA\, dF \\
&= \int_0^\infty\!\!\int_0^\infty Q(F)\, e^{-\alpha A}\, p(A - F)\, dF\, dA,
\end{aligned} \tag{15}$$

where $Q(F) = -F\tau'(F)$
and we have used the fact that $p(A) = 0$ for $A < 0$. Now the inverse Laplace transform
(loosely, take the coefficient of $e^{-\alpha A}$ on both sides) establishes that

$$A\, p(A) = \int_0^A Q(F)\, p(A - F)\, dF. \tag{16}$$

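Our interest is in the behavior of $p(A)$, or $G(x)$, for small values of $A$, or $x$; since
$F < A$ in (16), this corresponds to small values of $F$. Now the anticipated nature
of the pulse profiles comes into play. A pulse form of type $\lambda$ will be initiated (see
Fig. 1) at some time $-b_\lambda$. If it is thereafter determined by any standard chemical
kinetic sequence leading to its eventual disappearance, it will asymptotically decay
as $C_\lambda e^{-a_\lambda t}$ for some $a_\lambda$. At a low amplitude level $F$ the pulse therefore stays above
$F$ from essentially its initiation at $-b_\lambda$ until its tail falls to $F$ at time
$-(1/a_\lambda) \ln(F/C_\lambda)$, so the duration will be given by

$$\Delta\tau_\lambda(F) = b_\lambda - \frac{1}{a_\lambda} \ln(F/C_\lambda). \tag{17}$$

Consequently, we have for the total weighted duration

$$\tau(F) = \sum_\lambda q_\lambda \left(b_\lambda + \frac{1}{a_\lambda} \ln C_\lambda\right) - \left(\sum_\lambda \frac{q_\lambda}{a_\lambda}\right) \ln F, \tag{18}$$

from which $Q(F)$ of (15) has the constant value

$$Q(F) = Q = \sum_\lambda q_\lambda / a_\lambda. \tag{19}$$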
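
The approach of Q(F) to a constant can be checked numerically for a single pulse
type. In the Python sketch below, the pulse (a linear rise followed by an exponential
tail), its parameters, and the helper names `duration_above` and `Q_of_F` are all
assumptions for illustration. The duration above a level is obtained by counting grid
cells, and Q(F) = -F tau'(F) is estimated by a central difference in ln F.

```python
import math

def pulse(t, C=2.0, a=1.5, b=0.5):
    """Assumed pulse: linear rise from 0 at t = -b to C at t = 0, then C*exp(-a*t)."""
    if t < -b:
        return 0.0
    if t < 0.0:
        return C * (1.0 + t / b)
    return C * math.exp(-a * t)

def duration_above(level, f, t_lo=-1.0, t_hi=10.0, dt=1e-3):
    """Total time during which f(t) > level, by counting grid cells of width dt."""
    n = int(round((t_hi - t_lo) / dt))
    return dt * sum(1 for k in range(n) if f(t_lo + (k + 0.5) * dt) > level)

def Q_of_F(level, q=1.0, f=pulse, h=0.1):
    """Q(F) = -F * tau'(F) = -d tau / d ln F, by a central difference in ln F."""
    hi = duration_above(level * math.exp(h), f)
    lo = duration_above(level * math.exp(-h), f)
    return -q * (hi - lo) / (2.0 * h)

# For this pulse Q(F) should drift toward q/a = 1/1.5 as the level F decreases
for level in (1.0, 0.1, 0.01):
    print(level, round(Q_of_F(level), 3))
```

At larger levels the rising part of the pulse still contributes to Q(F); only at small
levels does the exponential tail dominate, which is the regime the extrapolation
relies on.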

Eq. (16), with $p(F) = 0$ for $F < 0$, then becomes

$$A\, p(A) = Q \int_0^A p(F)\, dF, \tag{20}$$

or in terms of the null measurement cumulative distribution $G(x)$ of (1), $xG'(x) = QG(x)$, with
the solution

$$G(x) = Cx^Q. \tag{21}$$

We conclude that

$$\ln G(x) = \ln C + Q \ln x, \tag{22}$$

so that a standard linear extrapolation of $\ln G$ vs $\ln x$ is valid at sufficiently small $x$.
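
The linear extrapolation of ln G against ln x can be sketched as an ordinary
least-squares fit. In the Python fragment below, the function names, the threshold
value 0.1, and the synthetic observations (generated from an exact power law so the
fit can be verified) are assumptions for illustration, not results from the paper.

```python
import math

def fit_power_law(xs, Gs):
    """Least-squares line ln G = ln C + Q ln x; returns (ln C, Q)."""
    u = [math.log(x) for x in xs]
    v = [math.log(g) for g in Gs]
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    Q = s_uv / s_uu
    return v_bar - Q * u_bar, Q

def extrapolate(x, ln_C, Q):
    """Predicted null frequency G(x) = C * x**Q at a level below the threshold."""
    return math.exp(ln_C + Q * math.log(x))

# Hypothetical null frequencies observed at thresholds above x0 = 0.1
xs = [0.1, 0.15, 0.2, 0.3]
Gs = [0.5 * x ** 1.7 for x in xs]
ln_C, Q = fit_power_law(xs, Gs)
print(Q, extrapolate(0.02, ln_C, Q))
```

The fit requires every observed G to be strictly positive, and it is only meaningful
when the thresholds used lie in the range where the asymptotic decay of the pulses
already dominates.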

Let us take a hypothetical example. It is that of chronic parasitic infection of
an organism, with continual birth of clusters of parasites, each of which is quenched
by the immune system. There is a large fluctuation in parasite load $A$, sampled
sequentially in equivalent test volumes, measurable if above the threshold $x_0$. If the
data is acquired via null measurements of the load above virtual thresholds $\{x > x_0\}$,
we want to extrapolate the ensuing $G(x)$ to $x < x_0$. Choose as typical population
spike (with origin at $\tau = 0$ rather than at $\max F_\lambda$; it makes no difference) the form

(23)

and for definiteness, $1 < c_\lambda < 5$, $1 < a_\lambda < 3$, $1 < d_\lambda < 5$ over a period $0 < \tau < 10$,
with parameters distributed uniformly in their domains, and all $q_\lambda = 1$. Evaluating
$\hat{A}$ of Eq. (4) for 1000 runs, the resulting $\ln G(x)$ is plotted against $\ln x$ in Fig. 2. The
feasible linear extrapolation region is indeed very large.

The conclusion (22) is not without assumptions that have been pointed out, but it
appears to be a result of some generality, exemplifying the assertion that extrapolation
is a model-dependent procedure, and that recognition of this fact has important
operational significance.

Figure 2: Typical Dependence of $\ln G$ on $\ln A$
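
A simulation in the spirit of the hypothetical example can be sketched in Python.
Several choices below are assumptions rather than the setup of the text: the spike is
replaced by a pure exponential pulse c*exp(-a*t) (the parameter d is not used), each
arriving pulse draws its own c and a, the total arrival rate `q_total` is set to 1,
and 20000 draws are used instead of 1000 to steady the slope estimate. Under these
assumptions the predicted slope of ln G against ln x is q_total times the mean of
1/a, that is ln(3)/2.

```python
import math
import random

def sample_load(rng, q_total=1.0, T=20.0):
    """One draw of the total load at t = 0 from pulses c*exp(-a*age), age >= 0.

    ASSUMPTION: pure-exponential pulses stand in for the spike shape of the
    text. Each pulse draws c in (1, 5) and a in (1, 3) uniformly. Ages of the
    pulses started within the last T time units form a Poisson process.
    """
    total, age = 0.0, rng.expovariate(q_total)
    while age < T:
        c, a = rng.uniform(1.0, 5.0), rng.uniform(1.0, 3.0)
        total += c * math.exp(-a * age)
        age += rng.expovariate(q_total)
    return total

rng = random.Random(1)
loads = [sample_load(rng) for _ in range(20000)]

def G(x):
    """Empirical null frequency: fraction of simulated loads below x."""
    return sum(1 for v in loads if v < x) / len(loads)

# Two-point slope of ln G against ln x, taken below the smallest pulse height
x1, x2 = 0.2, 0.8
slope = math.log(G(x2) / G(x1)) / math.log(x2 / x1)
predicted = math.log(3.0) / 2.0    # q_total * E[1/a] for a uniform on (1, 3)
print(slope, predicted)
```

A two-point slope is used only to keep the sketch short; with noisier data a
regression over several thresholds would be the more natural choice.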